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Theory; 

The  problems  inherent  in  multivariate  parameter  estimation  and  the 
identification  of  potential  outliers  have  been  documented  in  Gnanadesikan 
(1977,  Chapter  5)  and  Barnett  and  Lewis  (1979,  Chapter  6).  Procedures 
for  the  simultaneous  estimation  of  the  location  vector  y  and  the  covariance 
matrix  V  of  the  p-variate  Gaussian  distribution 

j  v  |  ~ 

f(x|y,V)  =  — exp{-*3 (x-y)Tv  ^x-y)},  (1) 

(2Tr)p/^  . 


given  a  random  sample  x  ,  x  ,  ...,  x  ,  have  been  discussed  by  Maronna 

~1  ~2  ~n 

(1976)  and  Huber  (1977,  Chapter  5) .  Our  procedure  stems  from  a  single 
underlying  primitive  principle  which  reduces  to  likelihood  as  a  special 
case  and  has  desirable  properties  not  possessed  by  competitors.  Let  c 
be  a  real  number.  The  self-critical  estimators  y  and  V  are  determined  from 
the  zeros  of 


n  f? 


ii  i.  /  3  log  f ,  ,  ~  _  » 

j*  i  - 


and 


?  fl  W)  — -g.f.i  _  i  3&1 

iil  Q  l  a  V  Q  9V  f 


0  , 


(2) 


(3) 


where 


Q(y,v?c) 


f  f1+c(x|y,V)dx 
J  *P 

{(1+C)P(2TT)CP  \v\Crh  . 


(4) 


The  arguments  of  f  and  Q  have  been  suppressed  in  (2)  and  (3)  for  notational 
convenience.  Equations  (2)  and  (3)  may  also  be  obtained  from  the  maximi¬ 
zation  of 


1 


2 


fc 

LC  =  c  {  c/(l+c)  -1 } 


with  respect  to  y  and  V  (Paulson  and  Delaney,  1982) .  It  may  be  shown 


lim  L  =  T  log  f.  (6) 

c  1 

c-K)  1=1 

so  that  the  self-critical  procedure  reduces  to  maximum  likelihood  as  a 
special  case.  These  procedures  may  also  be  developed  from  consideration 
of  the  generalized  mean.  We  have  termed  the  procedure  self-critical 
because  the  information  component  supplied  by  the  expressions  in  brackets 
{•}  in  (2)  and  (3)  are  "fed  back"  through  the  assumed  density  f  with 
degree  of  criticism  determined  by  c.  Observations  which  receive  relatively 
lew  final  weights  f^/Q,  a  scale-free  expression,  are  candidates  for  special 
examination  as  potential  outliers. 


Equations  (2)  and  (3)  reduce,  on  simplification,  to  the  joint  iterative 


forms 


j  *  y  w  .  x. , 

~m+l  . mi  -l. 
1=1 


Vi  *  (1+c)  l  "mi^i'V^i-V  ' 

1=1 


where 


exp{- f  (x  -umlT  V,"1!;  -M  )} 

"i "  ~ - — — — 191 

l  exp{-  f  (x  -u  )T  V  _i  (x  -y  ) } 
i=l  2  ~i  ~m  ~m  ~m 

for  m  =  0,  1,  ...,  m* .  Initial  estimates  y_  and  V  are  required  to  set  the 

^*0  -0 
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iterative  procedure  embodied  in  (7) ,  (8) ,  and  (9)  in  motion.  We  have 
frequently  used  w  instead  of  the  final  f./Q  to  assess  patterns  in  the 

m*i  l 

data  with  respect  to  the  multivariate  Gaussian  assumption  and  the  internal 
consistency  of  the  data. 

It  is  recommended  that  the  user  experiment  with  the  value  c  in 
examining  a  given  data  set.  The  estimators  y  and  V  are  increasingly  robust 


with  increasing  c  because  of  the  increasingly  self-critical  nature  of  the 

procedure.  If  for  several  values  of  c,  similar  estimates  of  y  and  V  are 

obtained  and  no  weight  w  .  .  is  relatively  small,  then  the  data  and 

m*i 

Gaussianity  are  self-consistent.  If  the  estimates  of  y  and  V  vary  sub¬ 
stantially  with  c  or  at  least  one  v;  lue  of  w  . .  is  small  relative  to  the 

m*i 

remainder,  then  those  observations  so  labeled  should  be  set  aside  for 

special  examination  vis-a-vis  the  remainder.  Since  the  w  . .  induce  a 

m*i 

virtually  automatic  ranking  of  the  observations,  they  are  very  useful  in 
determining  patterns  in  the  data.  We  have  typically  used  0£c£l  for  the 
multivariate  Gaussian  distribution  although  Paulson,  Presser,  and  Nicklin 
(1982)  have  found  use  for  negative  values  of  c,  as  well  as  values  of  c  in 
excess  of  unity.  The  magnitude  to  which  c  may  be  set  is  to  a  large  extent 
dependent  on  the  sample  size. 

The  estimators  y  and  V,  for  all  c>0,  are  location  and  scale  invariant, 


are  M-estimators,  are  jointly  asymptotic  normal,  have  closed,  bounded  and 
redescendent  to  zero  influence  functions,  and  are,  for  moderate  values  of 
c,  relatively  efficient  (Paulson  and  Delaney,  1982). 
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Numerical  Method: 

The  algorithm  begins  by  computing  initial  estimates  yQ  *  y,  the  usual 

__  A 

vector  of  sample  means,  and  VQ  =  v,  the  maximum  likelihood  estimator  of  the 
dispersion  matrix.  Formulae  (7) ,  (Sj ,  and  (9)  are  the  basis  for  the  self- 
critical  iterative  procedure.  Successive  estimates  of  y  and  V  are  gener¬ 
ated  until  the  norms  of  consecutive  estimates  are  less  than  a  user  specified 

small  amount,  ETA,  for  three  successive  iterations.  Generally,  we  have  used 
-4  -5 

ETA  =  10  or  10  .  Of  course,  the  number  of  iterations  required  for  con¬ 

vergence  will  depend  on  the  value  of  ETA  specified. 
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Structure; 

SUBROUTINE  SCEST (X,M,N,MDIM,NDIM,C , ETA, IMAX,MU , V,  WT  ,MUZ , VZ , IT, IFAULT) 


Formal  parameters 


X 

Real  array 

(M,N) 

input : 

data,  M  dimensions,  N  points 

M 

Integer 

input ; 

the  number  of  dimensions 

N 

Integer 

input : 

the  number  of  sample  points 

MDIM 

Integer 

input ; 

the  row  dimension  of  X  in  calling 
program 

NDIM 

Integer 

input: 

the  column  dimension  of  X  in  calling 
program 

C 

Real 

input : 

filtering  parameter 

ETA 

Real 

input : 

convergence  criterion  for  norms  of 
consecutive  estimates 

IMAX 

Real 

input : 

maximum  number  of  iterations  desired 

MU 

Real  array 

(M) 

output: 

final  location  vector  estimate, 

M  dimensions 

V 

Real  array 

(M,M) 

output : 

final  var-cov  matrix  estimate 

WT 

Real  array 

(N) 

output : 

final  weight  assigned  to  each  sample 
point  in  computation  of  estimates 

MUZ 

Real  array 

(M) 

output : 

initial  location  estimate;  vector 
of  means 

VZ 

Real  array 

(M,M) 

output : 

initial  var-cov  estimate;  MLE 

IT 

Integer 

output : 

the  number  of  iterations  used 

output:  0  if  no  errors  in  computation 

1  if  estimate  of  var-cov  matrix  is 
not  positive  definite 

2  if  accuracy  test  failed  on  inverse 
calculation 

3  if  convergence  criteria  not  met  in 
max  desired  iterations 


IFAULT  integer 
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Restrictions; 

The  number  of  dimensions  must  not  exceed  10,  unless  the  dimensions  of 
dummy  arrays  VINV,  TMP,  MUSAV,  VSAV,  and  VNORM  are  redeclared  (see  sub¬ 
routines  WEIGHT  and  CKNRM) .  This  procedure  requires  a  matrix  inversion 
subroutine.  As  written,  subroutine  WEIGHT  calls  on  the  IMSL  (International 
Mathematical  and  Statistical  Library)  subroutine  LINV2F  which  inverts  a 
matrix  with  specif iable  accuracy  =  IDGT.  The  use  of  alternate  matrix 
inversion  routines  would  require  rewriting  this  portion  of  the  subroutine 
WEIGHT. 

Precision; 

The  DOUBLE  PRECISION  version  is  recommended  due  to  the  iterative  nature 
of  the  algorithm  and  the  matrix  inverse  calculation. 

Time: 


Experience  with  an  IBM3033  computer  has  shown  that  the  procedure 
averaged  0.03  seconds  for  a  bivariate  sample  of  size  20  and  0.08  seconds 
for  a  trivariate  sample  of  size  25.  These  computations  were  done  using 
DOUBLE  PRECISION  and  a  convergence  criterion,  ETA  =  10  ^ . 
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SEAL*R  X  (5,  ISO)  ,MU  (5)  ,V  (5,5)  ,  WT  (15C)  ,  MUZ  (5)  ,VZ  (5,5) 

RE  AL*8  ETA,C,CSAV  (5)  ,TKP  (5)  ,TMP2  (5) 

DIMENSION  VF  (30) 

MDIM=5 
NDIM= 1 50 

DRIVER  INPUT:  N  S  SAMPLE  PTS<=  150,  M  *  DIMENSIONS  <  =  5 
IMAX  MAT  ITERS  ALLOWED,  NC  «  VALUES  0*  C  INPUT 
ETA  CCNVERG.  CRITERION  FOR  wOFMS 
READ  (5, 900)  N, M, IMA X, NC , ET A 
900  FORMAT (41?, E10. 5) 

C  DRIVER  INPUT:  CSAV  ARRAY  OF  C  VALUES 
PEAD(5,901)  (CSAV  (I)  ,1=1 , VC) 

<>01  FOPMAT  (5F5.3) 

C  DRIVER  INPUT:  VF  ARRAY  FOR  FORMAT  OF  DATA  CA"DS  =>nT  IN  () 

R  EA  D  (5,902) VF 
902  FORMAT  (30A4) 

C  DRIVER  INPUT:  X(I,J)  DATA  ONE  SAMPLE  POINT  PEP  CARD 

DO  10  J=1,N 

READ  (5, VF)  (X (I, J) ,1  =  1 , M) 

10  CONTINUE 

DO  100  KC=1 , NC 
C=CSA  V (KC) 

CALL  SCEST  (X  ,  M  ,  N  ,  MDT M,  ND  IM,C,  ET  A,  I  M  A  X  ,  MU,  V  ,  NT  ,  MUZ  ,  VZ  ,  TT  ,  I*7  AULT) 
WRITE  (6,910) 

°10  FOFMAT ( 1 H 1 , 1 7X , 2C H******* *********** **) 

IF  (IFAULT.  EQ.  0)  GO  TO  30 
C  NONNOPMAL  TERMINATION  MESSAGES 
GO  TO  (11,12,13) , IFAULT 

11  WRITE  (6 , R 1 1 )  IT 

911  FORMAT (1H9,46HESTIMATE  OF  VAF-COV  MATRIX  ALGORITHMICALLY  NOT 
&  19H  POSITIVE  DEFINITE, 4X,10HITSR  NO.  =,I5) 

GO  TO  30 

12  WRITE  (6,Q12) IT 

912  FOPMAT (1H0 ,45HACCURACY  TEST  FAILED  ON  INVERSE  CALCULATION 
RICH  ITER  NO.  =, 15) 

GO  TO  30 

13  WRITE  (6,q1  3)  I" AX 

913  FORMAT (1  HO , 32HCCNVERGENCS  CRITERIA  NOT  MET  IN  , 1 5 , 2X, 1 0 H ITER ATI°N S) 
C  NORMAL  TERMINATION  AND  OUTPUT 

30  WRITE  (6,930) C, ETA 

930  FORMAT (1H0,10X,34H***  SELF-CRITICAL  FSTI MATES  ***// 
S24HFILTERING  PARAMETER,  C  =,F7.4,3X, 

S20HCCNVSRG.  CRITERION  =,E12.5) 

C  COMPUTE  INITIAL  ♦  FINAL  CORRELATION  ESTIMATES 
C  STORE  IN  LOWER  OFF-DIAG  OF  VAR-COV  FOR  OUTPuT 
DO  35  1=1, M 
TMP2  (I)  =DSQRT  (VZ  (1,1)  ) 

IF  (IFAULT,  EQ.  0)  TMP  (I)  =DSQRT  (V  (I,  I)  ) 

35  CONTINUE 

k=n-i 

DO  45  1=1, K 
L=I*1 

DO  40  J=L, M 

V7.  (J,I)=VZ  (I,  J)  /  (TMP2  (I)  *TMP2  (J)  ) 

IF  (IFAULT.  EQ.  0)  V  ( J,  I)  =V  (T,  J)  /  (T*P  (T)  *TMP(J)) 

40  CONTINUE 
45  CONTINUE 

C  PRINT  OUT  INITIAL  ESTIMATES 


A- 2 


WRTTE  (.6,931) 

«31  FORMAT (1H0,10X,17HINITIAL  ESTIMATES) 

DO  50  I®  1 ,  M 

WRIT£(6,932).MUZ(I),  (VZ(I,J)  ,J*1,S) 

932  FORMAT  (1H0,P10.  5,8X,5  (P10.  5,3X) ) 

50  CONTINUE 

IF  (IT.  £Q.  1.  AND.  IFAOLT.  GT.  0)  GO  TO  100 
C  PRINT  OOT  PINAL  SC  ESTIMATES 
WRITE  (6,933) IT 

933  POR«AT(1RO,10X,26HFINAL  ESTIMATES  ITEP  NO.  =  ,I5) 

DO  60  1=1,8 

WRITE  (6,9  32)  Ml!  (I)  ,  (V(T,J)  ,J=1,R) 

60  CONTINUE 

C  PRINT  OUT  FINAL  SC  WEIGHTS  POP  EACH  SAMPLE  POINT 
WP ITE  (6  934) 

9’4  PCF.M  AT  (1H0 , 3X,  3HNO.  »3X, 6HVEIGHT,  10 X,  1  2HS AM PLS  POINT) 
DO  70  J=1 ,  M 

WRITE  (6,93  5)  J,  WT  (J)  ,  (I  (I,  J)  ,1=1 »  N) 

93  5  FORMAT  (1H0,I5,3X,E12.  5,3X,5F10.5) 

70  CONTINUE 
100  CONTINUE 
STOP 
END 


A- 3 


-J 


C 

c 

SUBROUTINE  SCEST  (X  ,  M,  N,  MDIM,  NDIM,C  ,  ET  A  ,  I M  AX,  MU  ,  V,  WT  ,  107  ,  V7  ,  IT,  I  FA  tlLT) 
C 

C  CALCULATES  S3LF-C3ITIC AL  ESTIMATES 
C 

REAL*9  X  (MDIM,  NDI M)  ,«U(MDIM)  ,  V  {MDIM,  MDIM)  ,  WT  (NDIM) 

REALMS  MUZ  (MDIM)  ,VZ  (MDIM,  MDIM)  , ZERO, ONE 
DATA  ZERO,CNE/O.ODO, 1,0D0/ 

IFA  UlT=0 

C  OBTAIN  INITIAL  ESTIMATES-  MLE 

CALL  INITL  (X,M,N,MUZ,VZ, MDIM, NDIM) 

DO  10  1=1, M 
MU  (I)  =  MUZ  (I) 

DO  5  J=  1  ,  N 
5  V  (I,J)  =  VZ  fl.J) 

1C  CONTINUE 

C  ITERATIVE  FORMATION  OF  NEW  ESTIMATES 
DO  100  IT=1 ,IMAX 

CALL  WEIGHT  (X , M ,N , M DIM , NDIM ,C , MO, V , WT , IFA ULT) 

IF  (IFAULT.NE,0) RETURN 
C  LOCATION  ESTIMATE 
DC  20  1=1, M 
MU  (I) =  ZERO 
DO  15  J=  1 , N 

15  MU  (I)  =  MU  (I)  +X  (I,  J)  *WT  (J) 

20  CONTINUE 
C  VAR-COV  ESTIMATE 
DO  40  1=1, M 
DO  35  J=T, K 

V  (I,J)=ZERO 
DC  30  F=1  ,  N 

30  V  (I,  J)  =V  (T,  J)  *  (X  (I,  K)  -MU  (I)  )  *  (T  (J,K)  -MU  (J)  )  *VT  (K) 

V  (I,  J)  =V  (I,  J)  *  (C  ♦ONE) 

IF  (I,  NE«  J)  V  (J,I)  =V  (I,  J) 

35  CONTINUE 
40  CONTINUE 

C  CHECK  CONVERGENCE  CRITERIA 

CALL  CKNFM  ( M , MDIM, ETA , MU, V , IT , IFL AG) 

IF  (IFIAG. EQ. 1)  RETURN 
100  CONTINUE 

C  CONVEPG,  CRTT.  'NOT  MET  IN  "AX  ITERATIONS  DESIRED 
IFAULT=3 
RETURN 
END 
C 
C 

SUBROUTINE  INITL  (X, M, N , MUZ ,VZ , MDIM , NDIM) 

C 

C  FORMS  INITIAL  ESTIMATES-  MLE 
C 

BE AL*8  X  ("DIM, NDIM)  ,MUZ  (MDIM)  ,VZ  (MDIM,  MDIM) 

REAL*R  DENOM, ZERO, ONE, SUM 
DATA  ZERO, ON E/0, 0D0, 1*  9D0/ 

DENCM  =  CPLOAT  (N) 

DO  10  1=1, H 
MUZ  (I)  =72RO 
DO  5  J=1,N 


on  n  noon  non  on 


5  MUZ  (I)  =  "UZ  (T)  ♦*  <1,  J) 

MUZ (I)  =  MUZ  (I)  /DENOM 
10  CONTINUE 
DO  25-  1=1. S 
DO  20  J=I,M 
SUM=ZEF0 
DO  15  K= 1 , N 

15  SDM=SUM*X  (I.K)  *X  (J.K) 

VZ  (I,J)= {SUM-DSNOM*MUZ  (I)  ♦  MUZ(J))/DENQM 
VZ(J,I)=VZ(I,J) 

20  CONTINUE 
25  CONTINUE 
RETURN 
END 


SUBROUTINE  WEIGHT (X, M, N, MOTH, NDI N, C , “0, V, WT , IFAULT) 

FORMS  WEIGHTS  FOB  EACH  SAMPLE  PCTNT 

R5AL*8  X  (MDIM,  NDI.M)  .MU(MDIH)  ,  V  (MDIM,  MDI.M)  ,  WT  (NDIM) 
REAL+8  VINV  (10,10) ,TMP  (10,10) .WKAREA (50) 

RFAL*8  R, SUM, ZERO, TWO 
DATA  ZERO, TWO/0. 0D0, 2. 000/ 

DO  10  1=1, M 
DO  5  J=1,S 
5  TMP  (I,J)  =V  (I,  J) 

10  CONTINUE 

OBTAIN  INVERSE  OF  CURRENT  VAR-COV  MATRIX  ESTIMATE 
LINV2F  IS  IMSL  SUBROUTINE  WHICH  INVERTS  MATRIX  WITH 
SPECIFIABLE  ACCUPACY=IDGT 
IDGT=5 
IEF=0 

CALL  LINV2F  (TMP.M, 1 0 , VI NV , IDGT , W KA PEA , IER) 
IF(IER.SQ.O)  GO  TO  20 
ERRORS  IN  MATRIX  INVERSION 
IF  (IER. GE. 129) IFAULT=1 
IF (IER. E0»  34) IFAULT=2 
RETURN 

END  OF  SECTION  REQUIFED  WITH  IMSL  SUBROUTINE 

20  SOM=ZEEO 
DO  35  J= 1 , N 
R=ZERO 
DO  30  1=1, H 
DO  25  K=1,« 

25  R=E >  (X  (I.J) -MU  (I)  )  *VIN  V  (I,K)  *  (X  (K,J)  -MU  (K)  ) 

30  CONTINUE 
R=R*C/TWO 

IF (R. IT. ZERO) IF A  ULT= 1 
IF (IFAULT. EQ. 1) RETURN 
IF  (R. LT.  1 74.  )  WT  (J)=DEXP(-R) 

IF  (R.  GE.  1  74.  )  WT  (J)  =ZERO 
SUM=SUM*WT (J) 

35  CONTINUE 
DO  40  J=1,N 
40  WT  (J)  =WT  (J) /SUM 
RETURN 
END 
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SUBROUTINE  CKNRM  (M,  MDI 1,  ET  A  ,  *10,  V,  IT  ,  IPLAO) 

CHECKS  CONVERGENCE  CRITERIA 

REAL*8  MU(NDIM)  ,  V  ( MDI M,  MDIM)  , KUSAV  (10)  ,  VS  AV  ( 10 , 10)  ,VNFM(10) 
REAL*8  MUNRH,ZERO 

DATA  ZERO/O.O DO/, MCTMU/0/, MCTV/0/ 

IFIAG=0 

IF(TT.EQ.I)  GC  TO  20 

LOCATION  NORM  =  SUN  ABSOLUTE  DIFFERENCE  OP 

CCNSEC.  ESTIMATE  ELEMENTS 
VAR-COV.  NOR*  =  *AX  ROW  SUM  ABS.  DIFCEPENCE 
CCNSEC.  ESTIMATE  ELEMENTS 

MUNPM=ZE30 
DO  10  1=1, H 

KUNRM=MUNEM*DABS  (KUSAV  (T)  -MO  (I)  ) 

VNRH  (I) =ZERO 
DO  5  J= 1  *  M 

5  VNRM (I)  =  VNRH  (I) + CABS ( VS AV (I, J) -V (I , J) ) 

10  CONTINUE 
MAX  =1 
DO  15  1=1, H 

IF (VNRH (I)  .  GT. VNRM (MAX) )  MAX=I 
15  CONTINUE 

IF  (VNRM  (MAX)  .IT.  ETA)  MCTV=MCTV*1 
IF  (WCM  (MAX)  .  GE.  ETA)  MCTV=0 
IF  (KUNRH.LT.  ETA)  MCT MU= MCT “U+ 1 
IF  (MUNPM.  GE.  ETA)  HCTMU=0 
C  CONVERGENCE  CRITERIA  MET  ON  3  CONSEC.  ESTIMATES 
IF  (KCTV.GT.  2.  AND.  MCTMU.GT.  2)  IFLAG=1 
C  SAVE  CUFPENT  ESTIMATES 
20  DO  30  1=1, M 
MUSA V  (I)  =MU  (I) 

DO  25  0=1, M 
25  VSAV  (I  ,  J)  =  V  (I ,  J) 

30  CONTINUE 
RETURN 
END 


